Speech Recognition Supported by Prosodic Information for Fixed Stress Languages
Abstract
In this paper we examine the use of prosodic features in speech recognition, with special attention paid to agglutinating and fixed-stress languages. The prosodic features used, the acoustic-prosodic preprocessing, and the segmentation into prosodic units are presented in detail. We use the term "prosodic unit" to distinguish these units from prosodic phrases, which are longer. We trained an HMM-based prosodic segmenter relying on the fundamental frequency and intensity of speech. The output of the prosodic segmenter is used for N-best lattice rescoring in parallel with a simplified bigram language model in a continuous speech recognizer, in order to improve speech recognition performance. Experiments for Hungarian show a WER reduction of about 4% using simple lattice rescoring.
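The rescoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` fields, the `rescore` function, and the weights `lm_weight` and `prosody_weight` are hypothetical names chosen for illustration, and the log-linear combination of scores is an assumption about how a prosodic segmenter's score could be applied alongside a language model score.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One entry of an N-best list (all fields are illustrative assumptions)."""
    words: List[str]
    acoustic_logprob: float   # log-score from the acoustic model
    lm_logprob: float         # log-score from the (bigram) language model
    prosodic_logprob: float   # log-score derived from the prosodic segmenter


def rescore(nbest: List[Hypothesis],
            lm_weight: float = 1.0,
            prosody_weight: float = 0.5) -> List[Hypothesis]:
    """Return the N-best list sorted by a weighted sum of log-scores, best first."""
    def combined(h: Hypothesis) -> float:
        return (h.acoustic_logprob
                + lm_weight * h.lm_logprob
                + prosody_weight * h.prosodic_logprob)
    return sorted(nbest, key=combined, reverse=True)


# Toy example: hypothesis "b" is acoustically slightly worse, but its word
# boundaries agree better with the (assumed) prosodic segmentation.
a = Hypothesis(["a"], acoustic_logprob=-10.0, lm_logprob=-5.0, prosodic_logprob=-8.0)
b = Hypothesis(["b"], acoustic_logprob=-11.0, lm_logprob=-5.0, prosodic_logprob=-1.0)
best_with_prosody = rescore([a, b])[0]
best_without_prosody = rescore([a, b], prosody_weight=0.0)[0]
```

In this toy case the prosodic score changes which hypothesis ranks first; in practice the weights would have to be tuned on held-out data.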
Similar resources
Using prosody to improve automatic speech recognition
In this paper acoustic processing and modelling of the supra-segmental characteristics of speech is addressed, with the aim of incorporating advanced syntactic and semantic level processing of spoken language for speech recognition/understanding tasks. The proposed modelling approach is very similar to the one used in standard speech recognition, where basic HMM units (the most often acoustic p...
Automatic Annotation of Speech Corpora for Prosodic Prominence
This paper presents a study on the automatic detection of prosodic prominence in continuous speech, with particular reference to American English, but with good prospects of application to other languages. Perceptual prosodic prominence is supported by two different prosodic features: pitch accent and stress. Pitch accent is acoustically connected with fundamental frequency (F0) movements and o...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Word segmentation in Persian continuous speech using F0 contour
Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to de-segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retract...
An evaluation of keyword spotting performance utilizing false alarm rejection based on prosodic information
In this paper, we describe our effort to develop a new method of false alarm rejection for keyword-spotting speech recognition systems. The method uses prosodic similarities and works as a posterior rescoring step. Keyword spotting always suffers from false alarms. Here, we propose a technique to reject those false alarms using prosodic features. In Japanese, prosod...
Publication date: 2007